Abstract: We study an interactive live streaming scenario 
where multiple peers pull streams of the same free viewpoint 
video that are synchronized in time but not necessarily in 
view. In free viewpoint video, each user can periodically 
select a virtual view between two anchor camera views for 
display. The virtual view is synthesized using texture and 
depth videos of the anchor views via depth-image-based 
rendering (DIBR). In general, the distortion of the virtual 
view increases with the distance to the anchor views, and 
hence it is beneficial for a peer to select the closest anchor 
views for synthesis. On the other hand, if peers interested in 
different virtual views are willing to tolerate larger distortion 
in using more distant anchor views, they can collectively 
share the access cost of common anchor views. 

Given anchor view access cost and synthesized distortion 
of virtual views between anchor views, we study the opti- 
mization of anchor view allocation for collaborative peers. 
We first show that, if the network reconfiguration costs 
due to view-switching are negligible, the problem can be 
optimally and efficiently solved in polynomial time using 
dynamic programming. We then consider the case of non- 
negligible reconfiguration costs (e.g., large or frequent view- 
switching leading to anchor-view changes). In this case, the 
view allocation problem becomes NP-hard. We thus present a 
locally optimal and centralized allocation algorithm inspired 
by Lloyd's algorithm in non-uniform scalar quantization. 
We also propose a distributed algorithm with guaranteed 
convergence where each peer group independently makes 
merge-and-split decisions with a well-defined fairness 
criterion. The results show that depending on the problem 
settings, our proposed algorithms achieve respective optimal 
and close-to-optimal performance in terms of total cost, and 
substantially outperform a P2P scheme without collaborative 
anchor selection. 

I. Introduction 

The advent of multiview imaging technologies means 
that videos from different viewpoints of the same 3D 
scene can now be captured simultaneously by a system 
of multiple closely spaced cameras [1]. If depth maps 
(per-pixel distance between camera and physical objects) 
from the same camera viewpoints are also available,¹ 
then virtual views can be synthesized during video playback 
using texture and depth maps of the closest captured 
camera views (i.e., anchor views) via depth-image-based 
rendering (DIBR) [3]. This ability to construct 
and observe any virtual view is called free viewpoint 
video [4], which enables a 3D visual effect known as 
motion parallax [5]: a viewer's detected head movements 
trigger correspondingly shifted video views on his/her 
2D display. It is well known that motion parallax is the 
strongest cue in human perception of depth in a 3D 
scene [6], enhancing the immersive experience. 

¹ Depth maps can be captured directly through time-of-flight (ToF) 
cameras [2], or indirectly through stereo-matching algorithms. 

In a live free viewpoint video streaming scenario, 
texture and depth videos from multiple viewpoints in 
the same 3D scene are real-time encoded into separate 
streams at server before delivery to interested peers. The 
clients, organized in a P2P system, can choose to look 
at the recorded anchor views or virtual views that are 
arbitrarily positioned between the anchor views. Because 
the distortion of a synthesized view tends to be larger as 
the virtual view's distance to the anchor views increases [7], it 
is beneficial for a viewer to request anchor views that 
tightly "sandwich" the virtual viewpoint he wants to 
look at. On the other hand, given that a group of local peers 
can share the access cost of common anchor views, peers 
have incentive to collaboratively select and share the 
same anchor views, even if doing so means that the 
anchor views are further away with a distortion penalty 
in the synthesized views. In this paper, we investigate 
the anchor view allocation problem for collaborative 
streaming of live free viewpoint video under different 
network settings. To the best of our knowledge, this 
is the first piece of work addressing such an issue for 
collaborative streaming of free viewpoint video. 

As a peer changes his view of interest $u$ over time, $u$ 
may eventually move outside the viewing range $[v^l, v^r]$ 
delimited by his two current anchor views $v^l$ and $v^r$. This 
requires the system to allocate new anchor views 
for the peer. If the network reconfiguration costs due 
to peers' view-switching are negligible, we first show that 
the anchor view allocation problem can be efficiently 
and optimally solved in polynomial time using dynamic 
programming (DP). This holds regardless of whether the anchor 
view access cost from the server to the group of peers is 
formulated as a constraint (i.e., the number of 
anchor views allocated to a peer group cannot exceed 
a certain number $B_{\max}$) or as a cost function (i.e., 
each anchor view pulled from the source incurs a certain 
access cost $a$). 

On the contrary, if the network reconfiguration cost 
due to peers' view-switching is non-negligible (e.g., in 
the case of large or frequent view-switching by the 
peers), the problem of anchor view allocation becomes 
NP-hard for both formulations of anchor view access 
cost (as a constraint or as a cost function). We thus 
present a locally optimal and centralized allocation 
algorithm inspired by Lloyd's algorithm in non-uniform 
scalar quantization [8]. Finally, we propose a distributed 
version of the algorithm with guaranteed convergence, 
where each peer group can independently make 
merge-and-split decisions with a well-defined fairness criterion. 
The results show that our proposed algorithms achieve 
optimal and close-to-optimal performance respectively 
in terms of total cost, and substantially outperform a 
P2P scheme without collaborative anchor selection. 

The outline of the paper is as follows. We first discuss 
related work in Section II. We then overview the live free 
viewpoint video streaming system in Section III. We formulate 
the anchor view allocation problem with negligible 
network reconfiguration cost and present the corresponding 
optimal DP algorithm in Section IV. We then formulate our 
problem with reconfiguration cost in Section V and show 
that it is NP-hard. We describe locally optimal solutions 
to the problem in Section VI. Finally, we present results 
and conclusions in Sections VII and VIII, respectively. 

II. Related Work 

Though much research in multiview video has 
focused on compression (e.g., multiview video coding 
(MVC) [9]), streaming strategies and network optimization 
for multiview video remain relatively unexplored 
research topics. [10] discusses an interactive multiview 
video streaming (IMVS) video-on-demand scenario, 
where only a single requested view per client is needed 
at one time during video playback as the client 
periodically requests view-switches. It proposes an efficient 
coding structure where a captured image can be encoded 
into multiple versions, so that the appropriate version 
can be transmitted depending on the content currently 
available in the decoder's buffer, in order to reduce the server 
transmission rate. Later, [11] leverages the IMVS 
coding structure for content replication, so that suitable 
versions of multiview video segments can be cached in a 
distributed manner across cooperative network servers. 

Our current work on anchor view allocation differs 
from the above work in that: i) we consider the more 
general free viewpoint video, where a client can select 
and synthesize any intermediate virtual view between 
two anchor views via DIBR; and ii) we focus on the live 
collaborative streaming scenario, where anchor views can 
be shared among peers that are synchronized in time but 
not necessarily in view. 

There has been a large body of work on peer-to- 
peer (P2P) streaming, addressing different aspects of the 
problem. For example, [12], [13] study the structure and 
organization of streaming overlays, while 
[14], [15] discuss the design and deployment of large-scale 
P2P streaming systems through measurement on 
real-world streaming systems. All the previous works 
above study single view streaming, and the results can- 
not be applied to live free viewpoint video streaming, 
where anchor-view selection is a critical and challenging 
issue. 

There has been little work studying multiview streaming 
over P2P networks. For example, the work of [16] 
proposes a scheduling algorithm that allows peers to 
frequently compute the scheduling of multiview segments. 
[17] studies achieving low view-switch delay by 
organizing viewers with different views together. These 
works essentially treat multiview video as streaming 
of multiple single-view videos, and it is not clear how 
to extend them to live free viewpoint streaming where 
anchor-view selection and its effect on distortion need 
to be considered. To the best of our knowledge, this 
is the first piece of work on collaborative streaming of 
interactive live free viewpoint video. 

III. Collaborative Streaming Model 

A. Network Model 

We model the free viewpoint video distribution network 
with two nodes: $S$ is the server node where live 
video streams originate, and $P$ is a single node representing 
a group of local peers with close geographical 
or network distance.² The connection between server $S$ 
and peer group $P$ may be modeled as a hard constraint; 
i.e., the number of anchor views pulled from $S$ by $P$ 
cannot exceed $B_{\max}$. Alternatively, the connection may 
be modeled as a soft constraint; i.e., each anchor view 
pulled by $P$ incurs a cost $a$ in the total cost function. 
The different connection constraints are used later in the 
problem formulation. 

B. Free Viewpoint Video Model 

Let $\mathcal{V} = \{1, 2, \ldots, V\}$ be a discrete set of captured 
viewpoints for $V$ equally spaced cameras in a 1D array, as 
done in [1] and others. Each camera captures both a 
texture map (RGB image) and a depth map (per-pixel physical 
distances between captured objects in the 3D scene and 
the capturing camera) at the same resolution. The texture map 
of an intermediate virtual viewpoint between any two 
cameras can be synthesized using the texture and depth 
maps of the two camera views (anchor views) via a 
depth-image-based rendering (DIBR) technique like 3D 
warping [3]. Disoccluded pixels in the synthesized view 
(pixel locations that are occluded in the two anchor 
views) can be filled using a depth-based inpainting 
technique like [18]. 

² If the peer group is too large, sub-division into smaller groups for 
independent content sharing is also possible. Our current formulation 
can be easily extended to this case. 

More specifically, denote by $u$ the virtual viewpoint that 
a peer currently requests for observation. We write $u$ as 
$u = 1 + k/K$, $k \in \{0, \ldots, (V-1)K\}$, for some large $K$.³ In 
other words, $u$ belongs to a discrete set of intermediate 
viewpoints between (and including) captured views 1 
and $V$, spaced apart by integer multiples of distance $1/K$ 
($u$ approaches a continuum as $K$ increases). We consider 
that a distribution function $q_u$ describes the fraction of 
peers in the group who currently request virtual view 
$u$. Any virtual view $u$ can be synthesized using left and 
right anchor views denoted as $v^l$ and $v^r$, respectively, 
where $v^l, v^r \in \mathcal{V}$ and $v^l \le u \le v^r$. Note that $v^l$ and $v^r$ 
do not have to be the closest captured views to $u$. The 
distortion of the synthesized view varies with the choice 
of anchor views. Let $D_u(v^l, v^r)$ be the distortion function 
of peers requesting virtual view $u$, which is synthesized 
using $v^l$, $v^r$ as anchors. 

1) Monotonic Distortion model: A reasonable assumption 
on distortion is monotonicity with respect to anchor 
view distance [7]. It is not guaranteed that distortion 
always increases with the distance between reference 
views, but this is true in the vast majority of settings. 
We hence consider a monotonic distortion model in 
this paper: a further-away anchor view does not lead to 
a smaller synthesized view distortion: 

\[
\begin{aligned}
D_u(v', v_u^r) &\ge D_u(v, v_u^r), & \forall\, v' \le v \le u \\
D_u(v_u^l, v) &\le D_u(v_u^l, v'), & \forall\, u \le v \le v'
\end{aligned}
\tag{1}
\]
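
As a minimal sketch of how condition (1) could be checked for a tabulated distortion function, consider the following Python fragment. The names `is_monotonic` and `toy_distortion` are illustrative assumptions, not part of the paper, and the quadratic toy model only stands in for measured DIBR distortion.

```python
def is_monotonic(distortion, u, views):
    """Check condition (1) for virtual view u.

    distortion(u, vl, vr) is a hypothetical callable returning D_u(vl, vr);
    views is the sorted list of captured views.
    """
    lefts = [v for v in views if v <= u]
    rights = [v for v in views if v >= u]
    for vr in rights:
        # A left anchor farther from u must not give smaller distortion.
        for far, near in zip(lefts, lefts[1:]):
            if distortion(u, far, vr) < distortion(u, near, vr):
                return False
    for vl in lefts:
        # A right anchor farther from u must not give smaller distortion.
        for near, far in zip(rights, rights[1:]):
            if distortion(u, vl, near) > distortion(u, vl, far):
                return False
    return True


def toy_distortion(u, vl, vr):
    # Assumed toy model: distortion grows with the distance to each anchor.
    return (u - vl) ** 2 + (vr - u) ** 2
```

Any distortion table that fails this check falls outside the monotonic model assumed in the rest of the analysis.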

C. View-switching Model 

To model the view-switching behavior of peers, we 
consider that a peer with current desired virtual view 
$u$ can switch in the next time instant to any virtual 
view $w$ with probability $P_{u,w}$, where $\mathbf{P}$ is the 
view-transition probability matrix. For example, if a peer stays 
in the current view $u = 1 + k/K$ with probability $Q$, and 
switches to either of the two adjacent views with equal 
probability $(1 - Q)/2$, we have the following transition 
probabilities: 

\[
P_{1+k/K,\,w} =
\begin{cases}
Q & \text{if } w = 1 + k/K \\
(1-Q)/2 & \text{if } w = 1 + (k \pm 1)/K \\
0 & \text{o.w.}
\end{cases}
\tag{2}
\]
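
The example transition probabilities in (2) can be sketched in Python as follows. The function name `transition_matrix` is hypothetical, and the treatment of the two boundary views (where one neighbor does not exist) is an assumption made here so that each row sums to 1; Equation (2) itself does not specify it.

```python
def transition_matrix(num_views, K, Q):
    """Build P for virtual views u = 1 + k/K, k = 0..(num_views-1)*K."""
    n = (num_views - 1) * K + 1
    P = [[0.0] * n for _ in range(n)]
    for k in range(n):
        P[k][k] = Q
        for nb in (k - 1, k + 1):
            if 0 <= nb < n:
                P[k][nb] = (1 - Q) / 2
            else:
                # Assumption (not in Equation (2)): at a boundary view, the
                # probability of the missing neighbor is added to staying.
                P[k][k] += (1 - Q) / 2
    return P
```

Row `k` of the returned list corresponds to the virtual view `1 + k/K`.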

IV. Formulation I: no reconfiguration cost 

In this section, we consider the case where the 
reconfiguration cost due to peers' anchor view changes is 
negligible, e.g., peers tend to switch views infrequently, 
and hence the distribution network does not need to 
be reconfigured often. We now formulate the anchor 
view allocation problem formally as the interactive 
free-viewpoint live streaming (IFLS) problem. 

³ Though we consider here equally spaced virtual views for ease of 
exposition, our analysis and algorithms can be easily generalized to 
uneven virtual view spacing as well. 

A. Optimization and System Variables 

We first define the optimization variables, which are 
the same for all our formulations of the problem. Let 
$\mathcal{V}' \subseteq \mathcal{V}$ be a purchased set of captured views selected by 
the peer group to serve as anchor views to synthesize 
the virtual views requested. A peer of virtual view $u$ selects 
left and right anchor views, $v_u^l$ and $v_u^r$, from the purchased 
set $\mathcal{V}'$ to synthesize its desired virtual view $u$. We 
consider the following anchor view selection constraint: 

\[
v_u^l \le u \le v_u^r, \quad v_u^l, v_u^r \in \mathcal{V}' \subseteq \mathcal{V}, \quad \forall u
\tag{3}
\]

In words, Equation (3) states that a peer of virtual view $u$ 
must select from $\mathcal{V}'$ a left anchor view $v_u^l$ to the left 
of $u$ (i.e., $v_u^l \le u$) and a right anchor view $v_u^r$ to the right of 
$u$ (i.e., $u \le v_u^r$). The selected anchor views $v_u^l$ and $v_u^r$ 
induce synthesized distortion $D_u(v_u^l, v_u^r)$, as discussed in 
Section III-B. These are our variables to be optimized. 

There is an access cost to purchase the set $\mathcal{V}'$ of anchor 
views by the peer group $P$. If there is a hard connection 
constraint (or cost budget), we have 

\[
|\mathcal{V}'| \le B_{\max}
\tag{4}
\]

One may alternatively consider a soft connection 
constraint, where the total access cost $A_{total}$ for the peer 
group is proportional to the number of anchor views 
purchased, i.e., $A_{total} = a\,|\mathcal{V}'|$. For now, we are only 
concerned with the access cost of camera views in the 
purchased set $\mathcal{V}'$; the question of how the cost should be 
fairly distributed to each peer is deferred to Section VI-C. 
If the connection is modeled as a hard constraint, the 
objective of the IFLS problem is to select a subset $\mathcal{V}'$ and 
anchor views $v_u^l, v_u^r \in \mathcal{V}'$ for each virtual view $u$, so as 
to minimize the aggregate distortion of all peers of all 
virtual views $u$, i.e., 

\[
\min_{\mathcal{V}',\, \{v_u^l, v_u^r\}} \; \sum_u q_u D_u(v_u^l, v_u^r),
\tag{5}
\]

subject to Constraints (3) and (4). We label this 
combinatorial optimization problem as IFLS-H. 

Alternatively, if the connection is modeled as a soft 
constraint, the objective becomes the total 
distortion of all peers of all virtual views $u$ plus the total 
access cost, 

\[
\min_{\mathcal{V}',\, \{v_u^l, v_u^r\}} \; \sum_u q_u D_u(v_u^l, v_u^r) + A_{total}
\tag{6}
\]

subject to Constraint (3). We label this problem as IFLS-S. 



B. Algorithm I: DP solution 

Both IFLS-H and IFLS-S can be solved optimally in 
polynomial time via DP. We show here how IFLS-S is 
solved; the algorithm for IFLS-H follows similar steps in a 
straightforward manner, and hence is omitted. 

Define $\Phi(v^l, u^l, u^r, v^r)$ as the minimum cost for all peers 
interested in virtual views $u \in [u^l, u^r]$, where $v^l$ and $v^r$ are 
the nearest left and right anchor views that have already 
been purchased. The optimal solution of IFLS-S can be 
found by a call to $\Phi(v_o^l, u_o^l, u_o^r, v_o^r)$, where $u_o^l$ and $u_o^r$ are the 
leftmost and rightmost virtual views requested by the 
peer group, and $v_o^l$ and $v_o^r$ are the corresponding camera 
views just to the left and right of them, i.e., 

\[
\begin{aligned}
v_o^l &= \lfloor u_o^l \rfloor, & u_o^l &= \arg\min_u \{u\} \ \text{s.t.}\ q_u > 0; \\
v_o^r &= \lceil u_o^r \rceil, & u_o^r &= \arg\max_u \{u\} \ \text{s.t.}\ q_u > 0.
\end{aligned}
\tag{7}
\]
Given the above, $\Phi(\cdot)$ can be recursively calculated as 

\[
\Phi(v^l, u^l, u^r, v^r) = \min \Big\{ \sum_{u=u^l}^{u^r} q_u D_u(v^l, v^r), \;
\min_{v^l < v < v^r} \big[ a + \Phi(v^l, u^l, u^{v-}, v) + \Phi(v, u^{v+}, u^r, v^r) \big] \Big\},
\tag{8}
\]

where $u^{v-}$ is the virtual view of a peer to the left and 
nearest to the new anchor view $v$ ($u^{v-} \le v$), and $u^{v+}$ is the 
virtual view of a peer to the right and nearest to $v$ 
($v < u^{v+}$). The loop invariant of Equation (8) is $v^l \le u^l \le u^r \le v^r$. 
In words, Equation (8) states that $\Phi(\cdot)$ is the smaller of: 
i) Sum of synthesized distortion of virtual views $u$, 
$u^l \le u \le u^r$, given that no more anchor views will be 
purchased (and hence $v^l$ and $v^r$ are the best anchor 
views for synthesis of views $u \in [u^l, u^r]$). 
ii) Cost of one more anchor view $v$, $v^l < v < v^r$, which is 
the access cost $a$ plus the recursive cost $\Phi(\cdot)$ using two 
virtual-view ranges, given by $[u^l, u^{v-}]$ and $[u^{v+}, u^r]$, 
that divide the original range $[u^l, u^r]$. 
The complexity of the solution given by Equation (8) 
can be analyzed as follows. Each time Equation (8) is 
solved for arguments $v^l$, $u^l$, $u^r$, and $v^r$, the result can be stored 
in entry $[v^l][u^l][u^r][v^r]$ of a DP table $\Phi$ so that any 
subsequent repeated sub-problem can be simply looked 
up. Each computation of Equation (8) takes $O(V)$ steps, 
and the size of the table is $O(V^4)$. This results in run-time 
complexity of $O(V^5)$. 
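
The memoized recursion can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: `solve_ifls_s` and the callable `distortion` are hypothetical names, requested views are addressed by index into a sorted list instead of by value, and the access cost of the outermost anchors from (7) is added outside the recursion so that the result matches objective (6).

```python
from functools import lru_cache
from math import floor, ceil


def solve_ifls_s(q, distortion, a):
    """Return the minimum of objective (6) for demand q.

    q: dict mapping requested virtual view u -> fraction of peers q_u.
    distortion(u, vl, vr): hypothetical callable giving D_u(vl, vr).
    a: access cost per purchased anchor view.
    """
    us = sorted(u for u, frac in q.items() if frac > 0)

    @lru_cache(maxsize=None)
    def phi(vl, i, j, vr):
        # Requested views us[i..j] lie in [vl, vr]; vl and vr are purchased.
        if i > j:
            return 0.0
        # Option i): buy no further anchor between vl and vr.
        best = sum(q[us[t]] * distortion(us[t], vl, vr)
                   for t in range(i, j + 1))
        # Option ii): buy one more anchor v and recurse on both sides.
        for v in range(vl + 1, vr):
            split = i
            while split <= j and us[split] <= v:
                split += 1  # us[i..split-1] are at or to the left of v
            cost = a + phi(vl, i, split - 1, v) + phi(v, split, j, vr)
            best = min(best, cost)
        return best

    vl0, vr0 = floor(us[0]), ceil(us[-1])
    outer = a * len({vl0, vr0})  # access cost of the outermost anchors
    return outer + phi(vl0, 0, len(us) - 1, vr0)
```

With a large access cost the recursion keeps only the two outer anchors, while a small access cost lets it purchase every camera view that reduces distortion.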

V. Formulation II: reconfiguration cost 

As the video is played back, a peer may switch his 
observation viewpoint from a virtual view $u$ to a new view 
$u'$, where $u'$ may fall outside the range $[v_u^l, v_u^r]$ spanned 
by the anchor views $v_u^l$ and $v_u^r$. The network hence needs 
to be reconfigured to supply the peer with new anchors. 
If the reconfiguration cost is non-negligible, the peer 
group would tend to choose anchors $v_u^l$ and $v_u^r$ that are 
further apart, so that the likelihood of the virtual view 
switching outside the range $[v_u^l, v_u^r]$ is low. In this section, 
we formulate the anchor-view allocation problem with 
reconfiguration costs, termed free-viewpoint live streaming 
with view-switching (FLSV). 

A. Reconfiguration Cost 

We define the reconfiguration cost $S_u(v_u^l, v_u^r)$ as the 
probability that a peer requires new anchor views during the 
next $\tau$ view-switches, given the current virtual view $u$ 
and the anchor views $v_u^l$ and $v_u^r$. $S_u$ may be computed 
as follows. We first define a sub-matrix $\mathbf{P}(v_u^l, v_u^r)$ that 
contains only the entries $P_{w,z}$, where $w, z \in [v_u^l, v_u^r]$, as defined 
in Equation (2). Note that unlike $\mathbf{P}$, the entries 
in a row of $\mathbf{P}(v_u^l, v_u^r)$ do not need to add up to 1. We 
can write $S_u$ as a simple sum: 

\[
S_u(v_u^l, v_u^r) = 1 - \sum_w P^{\tau}_{u,w}(v_u^l, v_u^r),
\tag{9}
\]

where $P^{\tau}_{u,w}(v_u^l, v_u^r)$ is the entry $[u][w]$ in matrix 
$\mathbf{P}^{\tau}(v_u^l, v_u^r) = \prod_{t=1}^{\tau} \mathbf{P}(v_u^l, v_u^r)$, the $\tau$-step transition probability. In words, 
Equation (9) states that the reconfiguration cost $S_u$ is 
one minus the probability that the peer stays within the 
range $[v_u^l, v_u^r]$ for all $\tau$ view switches. 
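
A minimal Python sketch of Equation (9) follows. Rather than forming the matrix power explicitly, it propagates the row of the sub-matrix product that starts at $u$, which gives the same sum. The function name `reconfiguration_cost` and its argument layout are assumptions for illustration, and $u$ is assumed to lie inside the anchor range as required by Constraint (3).

```python
def reconfiguration_cost(P, views, u, vl, vr, tau):
    """Probability of leaving [vl, vr] within tau view switches (Equation (9)).

    P: full transition matrix as a list of rows, indexed like `views`.
    views: list of virtual-view positions; u must be one of them.
    """
    inside = [i for i, w in enumerate(views) if vl <= w <= vr]
    # prob[i]: probability of being at views[i] without ever having left.
    prob = {i: 0.0 for i in inside}
    prob[views.index(u)] = 1.0
    for _ in range(tau):
        # One step of the sub-matrix P(vl, vr): transitions to views outside
        # the range are dropped, so the remaining mass can only shrink.
        prob = {j: sum(prob[i] * P[i][j] for i in inside) for j in inside}
    return 1.0 - sum(prob.values())
```

A wider anchor range keeps more probability mass inside, which is why non-negligible reconfiguration cost pushes peers toward anchors that are further apart.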

B. Objective Function 

We first consider the server-peer cost as a hard 
constraint, and formulate the FLSV-H optimization problem. 
The objective is to select a subset $\mathcal{V}'$ of camera views 
and to select anchor views $v_u^l, v_u^r$ for each virtual view 
$u$ within $\mathcal{V}'$, in order to minimize the total distortion of 
all peers plus a reconfiguration cost weighted by $\mu$, i.e., 

\[
\min_{\mathcal{V}',\, \{v_u^l, v_u^r\}} \; \sum_u q_u \big( D_u(v_u^l, v_u^r) + \mu S_u(v_u^l, v_u^r) \big),
\tag{10}
\]

subject to Constraints (3) and (4). 

We next consider the connection as a soft constraint. 
The objective then becomes the sum of the distortion, 
the reconfiguration cost, and the total access cost, i.e., 

\[
\min_{\mathcal{V}',\, \{v_u^l, v_u^r\}} \; \sum_u q_u \big( D_u(v_u^l, v_u^r) + \mu S_u(v_u^l, v_u^r) \big) + A_{total},
\tag{11}
\]

subject to Constraint (3). This problem is FLSV-S. 

C. NP-Hardness Proof 

Both FLSV-H and FLSV-S are NP-hard. We present 
the proof of FLSV-H here; the proof of FLSV-S follows 
similar argument and is discussed in the Appendix. 

We show that the well known NP-complete Minimum 
Cover (MC) problem is polynomial-time reducible to a 
special case of FLSV-H. In MC, a collection $C$ of subsets 
of a finite item set $S$ is given. The decision problem is: 
does $C$ contain a cover for $S$ of size at most $k$, i.e., a subset 
$C' \subseteq C$ where $|C'| \le k$, such that every item in $S$ belongs 
to at least one subset of $C'$? 

Consider a special case of FLSV-H where, in the 
optimal solution, all peers use the leftmost camera view 1 as 
their left anchor view. This is the case if the synthesized 
distortion for each peer of view $u$ is a local minimum 
whenever view 1 is used as left anchor, i.e., $D_u(1, v_u^r) \le 
D_u(v, v_u^r), \forall v, v_u^r$. Hence all peers will share view 1 as left 
anchor view, and need to select only the right anchor view 
to minimize the aggregate cost in Equation (10). 

Fig. 1. Cost with different right anchor. 

We first map items in set $S$ to consecutive virtual 
views $u$ (each with $q_u = 1/|S|$) just to the right of 
leftmost captured view 1. We map subsets in collection 
$C$ to captured views $v$ to the right of the virtual views 
$u$. We next construct reconfiguration cost $S_u(1, v_u^r)$ by 
assuming a view-switching probability Q > in (1) and 
$\tau = 1$, resulting in a decreasing $S_u(1, v_u^r)$ as a function of $v_u^r$ 
for all virtual views $u$, as shown in Figure 1. 

We first set distortion $D_u(1, v_u^r)$ for peers of virtual 
views $u$ such that the aggregate cost is a constant 
$a$, i.e., $D_u(1, v_u^r) + S_u(1, v_u^r) = a$. Then for each item 
$s_i$ in subset $c_j$, we reset distortion $D_u(1, v_u^r)$ (of virtual 
view $u$ corresponding to item $s_i$ and of anchor view 
$v_u^r$ corresponding to set $c_j$) to distortion $D_u(1, v_u^r - 1)$ of 
anchor view $v_u^r - 1$. Note that the distortion function 
remains monotonically non-decreasing. 

Figure 1 shows an example of the aggregate cost for 
a peer of virtual view $u$, where $d_0$ is the distortion and $S$ 
is the reconfiguration cost. Note that $d_0 + S = a$ except 
for $v_u^r = v_1$ and $v_u^r = v_2$. If an optimal solution to FLSV-H 
with constraint $B_{\max} = k + 1$ has a total cost less than $|S|a$, 
then the selected camera views will correspond to $C'$ in 
MC. Hence MC is a special case of FLSV-H. □ 

VI. Algorithm II: heuristics 

In this section, we present heuristic algorithms to 
address the anchor view selection problem with 
reconfiguration cost. We first present a centralized and 
locally optimal algorithm based on Lloyd's algorithm [8] 
in non-uniform scalar quantization. Then we present 
a distributed algorithm with guaranteed convergence, 
followed by the fair access cost allocation mechanism. 

A. Local Optimum with Lloyd's Algorithm 

We present here a low-complexity centralized 
optimization algorithm that converges to a locally optimal 
solution for FLSV. We first observe that for a given subset 
$\mathcal{V}' \subset \mathcal{V}$ of camera views with a fixed access cost $A_{total}$, a 
peer of virtual view $u$ can independently select $v_u^l$ and $v_u^r$ 
from $\mathcal{V}'$ in order to minimize its own sum of distortion 
and reconfiguration cost given by $D_u(v^l, v^r) + \mu S_u(v^l, v^r)$. 
This potentially leads to a better global solution. In 
other words, a solution cannot be globally optimal if a 
peer of a view $u$ can lower his own sum of distortion 
and reconfiguration cost by choosing a different left or 
right anchor view from the same purchased set $\mathcal{V}'$. We 
formalize this necessary condition for global optimality 
in the following lemma. 

Lemma 1: If $\mathcal{V}'$, the $v_u^l$'s and the $v_u^r$'s are a set of optimal 
variables, then peer(s) of any virtual view $u$ cannot 
switch from a selected left anchor view $v = v_u^l$ to another 
anchor view $v' \in \mathcal{V}'$ and lower the overall cost. □ 

The above Lemma also holds for switching of the right 
anchor view to lower the overall cost. 

While the first lemma is concerned with switching of 
anchor views within a fixed subset $\mathcal{V}'$ of camera views, 
we can similarly construct a second Lemma concerning 
a selected camera view $v \in \mathcal{V}'$ being replaced by another 
camera view $v' \notin \mathcal{V}'$. 

Lemma 2: If $\mathcal{V}'$, the $v_u^l$'s and the $v_u^r$'s are a set of optimal 
variables, then one cannot replace a selected camera 
view $v \in \mathcal{V}'$ with an unselected camera view $v' \notin \mathcal{V}'$, so 
that peers of views $u$ that currently select camera view 
$v$ as anchor, i.e., $v_u^l = v$ or $v_u^r = v$, switch to $v'$ as anchor, 
and lower the overall cost. □ 

These two Lemmas are analogous to the two necessary 
conditions in optimizing non-uniform scalar quantization 
(SQ). SQ is the problem of quantizing a large 
number of samples in $\mathbb{R}^1$ space into $k$ Voronoi regions 
for compact representation, so that only $\lceil \log k \rceil$ bits are 
required to represent a sample with minimal distortion. 
The first necessary optimality condition for SQ is that each 
sample can freely select a Voronoi region to represent 
itself, one whose centroid has the minimum distance to 
itself (minimum distortion). This is similar to our first 
Lemma. In the second optimality condition, each Voronoi 
region can freely select a centroid that minimizes the sum 
of distances to all samples in the region. This is similar 
to our second Lemma. 

Due to the similarity of our problem to SQ, we can 
deploy a modified version of the famed Lloyd's algo- 
rithm to solve our problem. We call our algorithm the 
centralized peer grouping (CPG) algorithm. 

For FLSV-H, we first pull the leftmost and rightmost 
camera views from the server, and then 
$B_{\max} - 2$ camera views are randomly pulled in between. 
For each peer we calculate the optimal anchor views 
(chosen from the $B_{\max}$ camera views) that minimize the sum 
of its distortion and reconfiguration cost. Similar to 
Lloyd's algorithm, we iteratively adjust the positions of the 
$B_{\max} - 2$ camera views to reduce the total cost of all peers 
in the group. In each iteration, we go through each 
of the $B_{\max} - 2$ camera views and calculate the new total cost if 
we shift the camera view one step to its left or 
right. If the new total cost is lower than the original, we 
substitute the camera view with the one to its left (or 
right). The algorithm stops when the total cost of peers 
cannot be further reduced. It is guaranteed to converge 
since the total cost strictly decreases with each accepted shift 
and there are only finitely many camera-view subsets. 

Fig. 2. Coalition of peers. 

For FLSV-S, we run the above procedure $V - 1$ times 
with $B_{\max} = 2$ to $V$, and then choose the $\mathcal{V}'$ 
that gives us the minimum total cost due to distortion, 
reconfiguration and access. 
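
The FLSV-H procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `cpg`, `total_cost` and the callable `peer_cost` (standing for $D_u + \mu S_u$) are hypothetical names, each accepted shift restarts the scan, and all requested views are assumed to lie in $[1, V]$ with $2 \le B_{\max} \le V$.

```python
import random


def total_cost(purchased, q, peer_cost):
    """Sum over u of q_u times the best peer_cost reachable from `purchased`."""
    total = 0.0
    for u, frac in q.items():
        lefts = [v for v in purchased if v <= u]
        rights = [v for v in purchased if v >= u]
        total += frac * min(peer_cost(u, vl, vr)
                            for vl in lefts for vr in rights)
    return total


def cpg(V, b_max, q, peer_cost, seed=0):
    """Local search over anchor sets of size b_max that keep views 1 and V."""
    rng = random.Random(seed)
    inner = sorted(rng.sample(range(2, V), b_max - 2))
    best = total_cost([1, *inner, V], q, peer_cost)
    improved = True
    while improved:
        improved = False
        for idx in range(len(inner)):
            for step in (-1, 1):
                cand = inner[idx] + step
                # Stay strictly between views 1 and V and avoid duplicates.
                if cand <= 1 or cand >= V or cand in inner:
                    continue
                trial = sorted(inner[:idx] + [cand] + inner[idx + 1:])
                cost = total_cost([1, *trial, V], q, peer_cost)
                if cost < best:
                    inner, best, improved = trial, cost, True
                    break  # positions changed; rescan from the start
            if improved:
                break
    return [1, *inner, V], best
```

For FLSV-S, one would call `cpg` once per value of `b_max` from 2 to `V`, add `a * b_max` to each returned cost, and keep the smallest total.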

B. Distributed Heuristic 

The centralized algorithm presented above is able 
to find a nearly optimal FLSV solution by assigning 
anchor views to each peer. The solution is suitable 
when there is a central controller, and the network is 
not large or highly dynamic (with peer arrivals, many 
view switchings and departures). In this section, we 
present a simple, adaptive and distributed heuristic for 
collaborative sharing of anchor views, or equivalently 
for constructing the overlay P2P network, which scales 
well to large networks with peer churn. We call this 
distributed heuristic the distributed peer grouping (DPG) 
algorithm. 

In a peer group, peers watching the same or adjacent 
virtual views are organized into "coalitions". Figure 2 
shows an example of how the peer coalitions are formed, 
where $u_i, u_j, \ldots, u_m$ are virtual views. Peers watching 
virtual views between $u_i$ and $u_j$ are organized into a 
coalition, i.e., Coalition 1. All peers that belong to the 
same coalition share anchor views and thus access costs. 
There is a leader peer (marked in white) in each coalition, 
which keeps track of the number of peers watching each 
virtual view and of the total cost of the whole coalition. 
It periodically exchanges the cost information with the 
neighboring coalitions on each side. Two neighboring 
coalitions may merge into a new bigger coalition, and a 
coalition may also split into two coalitions if the overall 
cost can be reduced. We discuss algorithms for peer joins, 
coalition merge and split, peer leaves and view switching 
in the following. 

Peer Join: When a new peer $i$ arrives, it first contacts 
a Rendezvous Point (RP) that forwards it to the peer 
group that $i$ belongs to. This could be done with an IP 
address lookup. If there is an existing coalition $C$ that 
covers the virtual view peer $i$ requests in the peer group, 
RP connects $i$ with the leader node of the coalition $C$. 
The node $i$ joins coalition $C$ and starts to pull anchor 
views from other peers in the coalition. The leader peer 
of $C$ updates the cost and information of the coalition. 
However, if the virtual view requested by peer $i$ is not in 
the range of any coalition, a new coalition will be created, 
and $i$ becomes the leader of the coalition. It pulls from the 
streaming server the anchor views that minimize 
its own costs (distortion and reconfiguration cost). 

Coalition merge: The coalition structure is adaptive to 
peer churn, which keeps the P2P network optimized. 
The leader peers of each coalition periodically exchange 
information with neighboring leaders. Let $L_1$, $L_2$ be the 
costs for $C_1$ and $C_2$ respectively and $L_m$ be the optimal 
cost from the result of the CPG algorithm run on $C_1 \cup C_2$ 
if $C_1$ and $C_2$ merge and cooperate. If $L_m < L_1 + L_2$, 
the two coalitions $C_1$ and $C_2$ are merged. Let $\mathcal{V}_m$ be 
the optimal set of anchor views returned by the CPG 
algorithm. Each peer $i$ in the merged coalition adapts 
to new anchor views $v^{l*}$ and $v^{r*}$ that give the minimum 
cost ($v^{l*}, v^{r*} \in \mathcal{V}_m$). The leader who requested the merge 
becomes the new leader of the merged coalition. 

Coalition split: For a big coalition $C_m$, the leader 
periodically examines whether splitting into two coalitions 
leads to lower cost. Let $u_m$ be a virtual view separating 
$C_m$ into two coalitions $C_l$, $C_r$. For each different $u_m$, the 
leader runs the CPG algorithm on both $C_l$ and $C_r$. If the 
combination of optimal costs is smaller than $L_m$, then 
$C_m$ is split into $C_l$ and $C_r$, and a new leader will be 
randomly selected for the newly created coalition. 

Peer leave: When a peer $i$ is about to leave, all content 
sharing between $i$ and its neighbors is stopped, and the 
leader node updates the cost of the coalition. If the leader 
node leaves, a new leader is randomly chosen. 

View switch: A peer $i$ could switch the virtual view it 
currently watches in the middle of a streaming session. 
If the new virtual view is still within the range of the 
coalition, peer $i$ can still pull anchor views from other 
peers and synthesize the new view. There will be no 
change of the overlay structure. However, if the new 
virtual view goes out of the range of the coalition, the 
peer will leave the current coalition and join (or create) 
a new coalition. It follows the same process as in the 
situation where peers join or leave the system. 
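
The merge and split tests performed by a coalition leader can be sketched in Python as below. The names `should_merge` and `best_split` are hypothetical, a coalition is represented simply as a list of its peers' virtual views, and `cost_of` is an assumed callable standing in for the optimal cost returned by the CPG algorithm on that list.

```python
def should_merge(cost_of, c1, c2):
    """Merge two neighboring coalitions when L_m < L_1 + L_2."""
    return cost_of(c1 + c2) < cost_of(c1) + cost_of(c2)


def best_split(cost_of, coalition):
    """Try every separating view; return (left, right) if it beats L_m, else None."""
    views = sorted(coalition)
    best, best_cost = None, cost_of(views)
    for cut in range(1, len(views)):
        if views[cut - 1] == views[cut]:
            continue  # peers watching the same view stay together
        left, right = views[:cut], views[cut:]
        cost = cost_of(left) + cost_of(right)
        if cost < best_cost:
            best, best_cost = (left, right), cost
    return best
```

Since both tests accept a change only when the summed cost strictly decreases, repeated merges and splits cannot cycle.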

C. Fair cost allocation within a coalition 

We propose a mechanism to fairly distribute the access 
costs to each peer for the DPG algorithm described in 
Section VI-B. From the above discussion, cooperation enables 
peers watching adjacent views to share the anchor 
views and thus the access cost. It helps to reduce the total 
cost of all users. As peers in P2P networks are selfish and 
rational, an important issue in our live free viewpoint 
video streaming problem is the fair allocation of the cost 
among peers in a coalition, so that our solution does not 
only minimize the total cost of the entire P2P network, 
but also helps each user to lower its own cost. As such, 
no user is willing to deviate from the proposed solution, 
and the constructed overlay P2P network is stable. 



Coalitional game theory provides an ideal tool for
deriving fair rules for cost reduction via cooperation in
our free-viewpoint live streaming problem [19]. Consider
a coalition $C = \{1, 2, \ldots, n\}$ with $n$ peers who watch
neighboring views and share the anchor views and the
access cost. Let $S \subset C$ be a subgroup of users in $C$
watching nearby views, and let $L(S)$ be the total cost of
the peers in $S$ if they decide to cooperate, where $L$ is
the cost function defined earlier. An allocation vector
$x = [x_1, x_2, \ldots, x_n]$ divides the total cost $L(C)$ among
its $n$ members, where $x_i$ is the cost (including view
distortion, access cost and reconfiguration cost) assigned
to user $i$.$^4$

Given an allocation $x$, define the excess of a subgroup
$S \subset C$ (with respect to $x$) as
$e(S, x) = L(S) - \sum_{i \in S} x_i$, which is the extra cost
incurred by $S$ if its members deviate from the coalition
$C$ and the allocation $x$ and form a coalition $S$ by
themselves. If $e(S, x) \geq 0$, the subgroup $S$ has no
incentive to deviate from the coalition $C$. If all excesses
of an allocation are non-negative, then users in $C$ have
an incentive to stay in $C$, and our goal is to find such
stable coalitions and allocations.
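
The excess and the stability condition can be checked directly by enumerating subgroups. The sketch below is illustrative only: `toy_cost` is a hypothetical cost function (a shared charge of 4 plus 1 per peer), not the cost function of the paper, and the enumeration is exponential in the coalition size, so it is only practical for small coalitions.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence

Cost = Callable[[FrozenSet[int]], float]


def excesses(peers: Sequence[int], cost: Cost,
             x: Dict[int, float]) -> Dict[FrozenSet[int], float]:
    """e(S, x) = L(S) - sum_{i in S} x_i for every non-empty proper S."""
    result = {}
    for size in range(1, len(peers)):
        for subset in combinations(peers, size):
            s = frozenset(subset)
            result[s] = cost(s) - sum(x[i] for i in s)
    return result


def is_stable(peers: Sequence[int], cost: Cost,
              x: Dict[int, float], tol: float = 1e-9) -> bool:
    """x must split L(C) exactly and leave no subgroup a negative excess."""
    total_ok = abs(sum(x[i] for i in peers) - cost(frozenset(peers))) <= tol
    return total_ok and all(e >= -tol for e in excesses(peers, cost, x).values())


def toy_cost(s: FrozenSet[int]) -> float:
    """Hypothetical cost: shared access charge of 4 plus 1 per peer."""
    return 4.0 + len(s)


peers = [1, 2, 3]
equal_split = {i: toy_cost(frozenset(peers)) / 3 for i in peers}
lopsided = {1: 6.0, 2: 0.5, 3: 0.5}
```

Under this toy cost the equal split is stable, whereas the lopsided allocation charges peer 1 more than it would pay alone, so its excess is negative and it would prefer to leave.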

Finding such stable allocations is often difficult, and
a well-known fair solution is the nucleolus [19], [20].
The nucleolus always exists and is unique. It maximizes
the excesses in non-decreasing order or, equivalently,
minimizes the peers' dissatisfaction in non-increasing
order. Moreover, it is one of the stable allocations if
any exist. The nucleolus is defined as follows. Given
an allocation $x$, let $\Phi(x)$ be the vector of all excesses
$\{e(S, x) : S \subset C,\ S \neq \emptyset,\ S \neq C\}$ sorted in
non-decreasing order. The nucleolus $\eta$ is the unique
allocation that lexicographically maximizes$^5$ $\Phi$ over all
allocations, that is, $\Phi(\eta) \geq_{lex} \Phi(x)$ for every
allocation $x$.

To compute the nucleolus, we follow the above definition
and solve a sequence of linear programs as follows [20].
We first solve the following problem

$$
\begin{aligned}
(LP_1) \quad \max_{x,\,\epsilon} \quad & \epsilon \\
\text{s.t.} \quad & \sum_{i \in C} x_i = L(C), \\
& \sum_{i \in S} x_i \leq L(S) - \epsilon, \quad \forall S \neq \emptyset,\ S \neq C,
\end{aligned}
\qquad (12)
$$

which adds constraints on the allocation vector $x$ to
maximize the smallest excess. Let $\epsilon_1$ be the optimal
value of $(LP_1)$, which is the maximal smallest excess,
and let $\mathcal{S}_1$ be the collection of all subgroups whose
excesses are equal to $\epsilon_1$. We then solve

$$
\begin{aligned}
(LP_2) \quad \max_{x,\,\epsilon} \quad & \epsilon \\
\text{s.t.} \quad & \sum_{i \in C} x_i = L(C), \\
& \sum_{i \in S} x_i = L(S) - \epsilon_1, \quad \forall S \in \mathcal{S}_1, \\
& \sum_{i \in S} x_i \leq L(S) - \epsilon, \quad \text{otherwise},
\end{aligned}
\qquad (13)
$$

which maximizes the second smallest excess. We continue
in this way until only one allocation $x$ satisfies all the
constraints in the optimal solution; that allocation is
the nucleolus.

In DPG, we apply the above procedure to compute the
nucleolus for each coalition found by the algorithm.

4 Note that, from Section III-B and Equation (9), users' view distortion
and reconfiguration costs are fixed once the set of anchor views is
selected, and only their access costs can be adjusted to achieve fairness
among peers in a coalition. In our work, given the desired allocation
$x$, we adjust users' access costs to ensure that user $i$'s total cost is $x_i$.

5 A vector $a$ is said to be lexicographically larger than a vector $b$
($a >_{lex} b$) if, in the first component in which they differ, the
component of $a$ is larger than that of $b$.
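
As a worked illustration of the linear-programming procedure (a sketch for the smallest non-trivial case, not a result taken from the paper), consider a coalition of two peers, $C = \{1, 2\}$. The only non-empty proper subgroups are $\{1\}$ and $\{2\}$, so the first linear program and its solution are:

```latex
\begin{aligned}
\max_{x_1, x_2, \epsilon} \quad & \epsilon \\
\text{s.t.} \quad & x_1 + x_2 = L(C), \\
& x_1 \leq L(\{1\}) - \epsilon, \qquad x_2 \leq L(\{2\}) - \epsilon . \\[4pt]
\text{Summing the two inequalities:} \quad
& L(C) \leq L(\{1\}) + L(\{2\}) - 2\epsilon \\
\Rightarrow \quad
& \epsilon_1 = \tfrac{1}{2}\bigl( L(\{1\}) + L(\{2\}) - L(C) \bigr), \\
& x_1 = L(\{1\}) - \epsilon_1, \qquad x_2 = L(\{2\}) - \epsilon_1 .
\end{aligned}
```

At the optimum both inequalities are tight, so the allocation is already unique after the first linear program and no further programs are needed. Each peer pays its stand-alone cost minus an equal share $\epsilon_1$ of the total saving, and the allocation is stable exactly when that saving is non-negative.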

VII. Experimentation 

In this section we present illustrative simulation results.
In the simulations, we assume the distortion function
$D_u$ has the following form:

$$D_u(v^l, v^r) = \gamma \, e^{\alpha_u (v^r - v^l)} \left( e^{\beta_u \min(u - v^l,\, v^r - u)} - 1 \right). \qquad (14)$$

Note that if virtual view $u$ is itself one of the anchor
views, then the distortion $D_u$ is zero. The rate at which
the distortion increases with the distance between the
anchor views depends on the parameters $\alpha_u$ and $\beta_u$.
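
A small sketch of such a distortion model is given below. It assumes the exponential form $\gamma e^{\alpha (v^r - v^l)} (e^{\beta \min(u - v^l, v^r - u)} - 1)$, which matches the two properties stated above (zero distortion at an anchor view, growth with the anchor distance). The default parameter values are hypothetical and are not the ones used in the simulations.

```python
import math


def distortion(u: float, v_left: float, v_right: float,
               gamma: float = 1.0, alpha: float = 0.1,
               beta: float = 0.5) -> float:
    """Synthesized-view distortion for virtual view u between two anchors.

    gamma, alpha and beta are illustrative defaults (assumptions), not
    values from the paper.
    """
    if not v_left <= u <= v_right:
        raise ValueError("virtual view must lie between its anchor views")
    spread = v_right - v_left                 # distance between the anchors
    nearest = min(u - v_left, v_right - u)    # distance to the closer anchor
    return gamma * math.exp(alpha * spread) * (math.exp(beta * nearest) - 1.0)
```

For example, `distortion(4, 3, 5)` is smaller than `distortion(4, 2, 6)`, reflecting the penalty a peer accepts when it agrees to use more distant anchor views.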

Unless otherwise stated, we use the following baseline 
parameters in our simulation: number of captured views: 
21, number of virtual views: 200, number of peers: 10000, 
co = 0.4, t = 6, a = 5. We assume that the distribution of 
peers watching each virtual view follows a normal dis- 
tribution. We have also run our simulations on different 
peer distributions. The results of those simulations are 
qualitatively the same as what is presented here, and 
hence are not shown for the sake of brevity. 

A. Results for Negligible Reconfiguration Cost 

We compare the DP-based optimal solution with a 
simple P2P approach for solving the IFLS problem. In the 
latter simple P2P approach, peers independently choose 
the anchor views that minimize their own distortion. 
The access costs of each anchor view are shared by all 
users that request it. There is no collaboration on anchor 
selections among peers. 
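
The simple P2P baseline can be sketched as follows. This is an illustrative reading of the description above, under two assumptions that are not stated in the paper: a peer whose virtual view coincides with a captured view pulls only that view, and each distinct captured view is pulled from the server once, so the total access cost is the price times the number of distinct views requested. The function names are hypothetical.

```python
import bisect
from typing import Iterable, List, Set, Tuple


def nearest_anchors(u: float, captured: List[float]) -> Tuple[float, float]:
    """Closest captured views to the left and right of virtual view u.
    `captured` must be sorted and u must lie within its range."""
    if not captured[0] <= u <= captured[-1]:
        raise ValueError("virtual view lies outside the captured views")
    idx = bisect.bisect_left(captured, u)
    if captured[idx] == u:            # u coincides with a captured view
        return u, u
    return captured[idx - 1], captured[idx]


def simple_p2p_pulled(virtual_views: Iterable[float],
                      captured: List[float]) -> Set[float]:
    """Captured views requested when every peer picks its nearest anchors."""
    pulled: Set[float] = set()
    for u in virtual_views:
        pulled.update(nearest_anchors(u, captured))
    return pulled


def simple_p2p_access_cost(virtual_views: Iterable[float],
                           captured: List[float], price: float) -> float:
    """Total access cost: one payment per distinct captured view pulled."""
    return price * len(simple_p2p_pulled(virtual_views, captured))
```

Because each peer minimizes only its own distortion, peers spread over the whole view range end up pulling almost every captured view, which is why the access cost of this baseline grows quickly with the view price.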

Figure 3 shows the total cost (distortion plus access
costs) for the peers as a function of the price of camera
views. Our CPG algorithm gives much better results than
the simple P2P approach, especially when the price is
high. This is because, in the DP algorithm, the peers
collaboratively select and share the same anchor views
to reduce the access cost, at the price of a small
distortion penalty. Therefore, fewer captured views are
pulled from the server, and the total cost is minimized.

B. Results for Non-Negligible Reconfiguration Cost 

We carried out simulations to compare the performance
of our proposed CPG and DPG algorithms against the
optimal solution (Optimal) and the simple P2P approach.
Fig. 3. Total cost versus price of captured views.
Fig. 4. Total cost versus price of captured views.
Fig. 5. Number of captured views pulled.



The optimal solution is obtained through exhaustive 
search. The simple P2P approach is similar to the one 
we used in IFLS except that peers choose anchor views 
to minimize their own total cost. 

Figure 4 shows the total cost of all peers versus the
price of a captured view. The total cost increases with
the price of a camera view. This is because a higher view
price leads to a higher access cost, so peers tend to
share the same anchor views with others in order to
split the cost of pulling common anchor views from the
streaming server. This, in turn, increases the other cost
components, i.e., the distortion and reconfiguration
costs. From the figure, we see that CPG performs very
close to the globally optimal solution: the anchor views
successfully adapt to positions that keep the total cost
of all peers low. DPG is also very effective in reducing
the total cost, especially when the price of a captured
view is high. DPG does not outperform simple P2P when
the view price is low, due to its lack of global information.

Figure 5 shows the total number of views pulled from
the streaming server as a function of the access cost of
an anchor view. The number drops as the access cost
increases. When requesting a captured view from the
streaming server becomes expensive, peers tend to seek
more cooperation by using the same anchor views and
sharing the access cost in order to reduce their own
access costs. Therefore, the total number of camera
views pulled from the streaming server becomes smaller.
In DPG, the total number of views pulled can be higher
than the total number of camera views, since peers
only share the access costs within the same coalition and
a captured view can be pulled multiple times by peers
from different coalitions.

Figure 6 shows the number of coalitions formed by the
heuristic algorithm. The number of coalitions drops as
the price of a captured view increases. When the anchor
views are expensive, neighboring coalitions are more
likely to merge into a bigger one so that the access
costs can be shared by more peers. The heuristic can
efficiently rearrange the topology to minimize the total
cost when the view price changes.

Figure 7 shows the total cost of all peers versus the peer
population. The total cost increases with the number
of peers. Simple P2P performs the worst: it has a very
high total cost even when the number of peers is low,
due to the lack of collaboration in the peers' anchor
selections. DPG and CPG achieve close-to-optimal
performance. When there are fewer peers in the system,
they tend to use the same anchor views to reduce the
access cost, with a penalty in the other cost components.
When the peer population increases, each peer can
choose better anchor views that lead to lower distortion
and reconfiguration costs, since there are more
neighbors to share the access costs.

Figure 8 shows the cost components of the CPG algorithm.
As the view price increases, the access cost becomes the
major component of the total cost. The distortion and
reconfiguration costs also increase because peers
compromise on suboptimal anchor views (in terms of
distortion and reconfiguration) so that their access costs
can be shared with a larger crowd. The cost components
of DPG are qualitatively the same as those of CPG, and
hence are not shown for brevity.

VIII. Conclusion 

In this paper we study the design and optimization of
interactive P2P streaming of live free viewpoint video. In
free viewpoint live streaming, peers can select different
virtual viewpoints, which are synthesized using texture
and depth videos of the anchor views captured by
multiple cameras. The access cost of common anchor
views is collectively shared by peers at the price of
higher distortion. We formulate two problems, IFLS
with negligible reconfiguration cost and FLSV with
non-negligible reconfiguration cost. We then provide
a DP-based optimal solution for IFLS and heuristic
algorithms for FLSV. The simulation results show that
our proposed algorithms achieve optimal and
close-to-optimal performance, respectively, in terms of
total cost, and substantially outperform a P2P scheme
without collaborative anchor selection.
Appendix

We prove that FLSV-S is also NP-hard by reducing the NP-complete
MC problem to a special case of FLSV-S. Following
a construction similar to that in the proof for FLSV-H, we first map
the items in set $S$ to virtual views $u$ (each with $q_u = 1/|S|$) to
the right of the leftmost captured view 1, and map the subsets in
collection $C$ to captured views $v$ to the right of the virtual
views. Consider again the case where the optimal solution has
all peers sharing view 1 as their left anchor.

We construct the reconfiguration cost $S_u(1, v^r_u)$ as in the
FLSV-H proof. Next, we identify the smallest $S_u(1, v)$ over all $u$
and $v$ for which $u$ and $v$ correspond to an item and a subset in the
original MC problem, respectively. Let $\delta = S_u(1, v - 1) - S_u(1, v)$.
We then construct $D_u(1, v)$ to be $1 - S_u(1, v) - \delta$ if the subset
corresponding to $v$ contains the item corresponding to $u$, and
$1 - S_u(1, v)$ otherwise. This means that a virtual view covered
by a camera view $v$ has a decrease of $\delta$ in distortion.
Note that, by the definition of $\delta$, $D_u(1, v)$ is monotonically
non-decreasing. Finally, we define the access cost $a = \delta/(|C| + 1)$,
which means that purchasing all the captured views $v$ is
cheaper than paying $\delta$ for a virtual view $u$ left uncovered by a
captured view $v$.

We now claim that, if the optimal solution to FLSV-S has
access cost smaller than $k\delta/(|C| + 1)$, then the corresponding MC
decision problem is positive, and vice versa. The reason is the
following. Under the above construction, FLSV-S can always
find a solution that covers all virtual views $u$ (items in MC)
with camera views $v$. If the minimum cost solution requires $k$
or fewer captured views, then the corresponding subsets
cover all items in $S$ in MC.



